Depth First Exploration of a Configuration Model
We introduce an algorithm that constructs a uniform random graph with a
prescribed degree sequence together with a depth-first exploration of it. In
the so-called supercritical regime, where the graph contains a giant component,
we prove that the renormalized contour process of the Depth First Search Tree
has a deterministic limiting profile that we identify. The proof goes through a
detailed analysis of the evolution of the empirical degree distribution of
unexplored vertices. This evolution is driven by an infinite system of
differential equations, which has a unique and explicit solution. As a
byproduct, we deduce the existence of a macroscopic simple path and obtain a
lower bound on its length.
Comment: 30 pages
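As a rough illustration of the coupling the abstract describes, here is a minimal Python sketch, not the paper's algorithm and with illustrative names, that builds a configuration-model multigraph on the fly while running a depth-first exploration: each vertex holds unpaired half-edges, and whenever the exploration needs a neighbor, one of the current vertex's half-edges is paired with a uniformly random unpaired half-edge.

```python
import random

def dfs_configuration_model(degrees, seed=None):
    """Couple the construction of a configuration-model multigraph with a
    depth-first exploration of it; returns the DFS forest as a parent map.
    `degrees` is the prescribed degree sequence."""
    rng = random.Random(seed)
    n = len(degrees)
    # one entry per unpaired half-edge ("stub")
    pool = [v for v in range(n) for _ in range(degrees[v])]
    visited = [False] * n
    parent = {}

    def pop_uniform_stub():
        # remove and return a uniformly random unpaired half-edge
        i = rng.randrange(len(pool))
        pool[i], pool[-1] = pool[-1], pool[i]
        return pool.pop()

    for root in range(n):
        if visited[root]:
            continue
        visited[root] = True
        stack = [root]
        while stack:
            v = stack[-1]
            if v not in pool:          # v has no unpaired half-edge left
                stack.pop()
                continue
            pool.remove(v)             # consume one half-edge of v ...
            if not pool:               # odd total degree: dangling stub
                break
            u = pop_uniform_stub()     # ... and pair it uniformly at random
            if not visited[u]:         # tree edge of the DFS forest
                visited[u] = True
                parent[u] = v
                stack.append(u)
            # otherwise the pair is a back edge, loop or multi-edge: ignore
    return parent

# toy run: 3-regular degree sequence on 1000 vertices (supercritical)
forest = dfs_configuration_model([3] * 1000, seed=0)
print(len(forest), "tree edges found")
```

Pairing half-edges only when the exploration requests them is what makes the graph construction and the DFS a single process; the DFS forest, and hence a contour process, can be read off the returned parent map.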
A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification
One of the pursued objectives of deep learning is to provide tools that learn
abstract representations of reality from the observation of multiple contextual
situations. More precisely, one wishes to extract disentangled representations
which are (i) low dimensional and (ii) whose components are independent and
correspond to concepts capturing the essence of the objects under consideration
(Locatello et al., 2019b). One step towards this ambitious project consists in
learning disentangled representations with respect to a predefined (sensitive)
attribute, e.g., the gender or age of the writer. Perhaps one of the main
applications of such disentangled representations is fair classification.
Existing methods extract the last layer of a neural network trained with a loss
composed of a cross-entropy objective and a disentanglement regularizer. In
this work, we adopt an information-theoretic view of this problem, which
motivates a novel family of regularizers that minimizes the mutual information
between the latent representation and the sensitive attribute conditional on
the target. The resulting set of losses, called CLINIC, is parameter-free and
thus easier and faster to train. CLINIC
losses are studied through extensive numerical experiments by training over 2k
neural networks. We demonstrate that our methods offer a better
disentanglement/accuracy trade-off than previous techniques, and generalize
better than training with the cross-entropy loss alone, provided that the
disentanglement task is not too constraining.
Comment: Findings AACL 2022
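The general recipe, a cross-entropy objective plus a regularizer that penalizes dependence between the representation and the sensitive attribute within each target class, can be sketched as follows. This is only a hedged illustration: the penalty below is a crude within-class mean-matching proxy for I(Z; S | Y), not the CLINIC estimator, and the names (`conditional_disentanglement_penalty`, `lam`) are ours.

```python
import torch
import torch.nn.functional as F

def conditional_disentanglement_penalty(z, s, y, num_classes):
    """Crude proxy for the conditional mutual information I(Z; S | Y):
    within each target class, penalize the gap between the mean latent
    representations of the two sensitive groups.
    z: (B, d) latents, s: (B,) binary sensitive attribute, y: (B,) targets."""
    penalty = z.new_zeros(())
    for c in range(num_classes):
        z0 = z[(y == c) & (s == 0)]
        z1 = z[(y == c) & (s == 1)]
        if len(z0) > 0 and len(z1) > 0:
            penalty = penalty + (z0.mean(0) - z1.mean(0)).pow(2).sum()
    return penalty

def fair_loss(logits, z, s, y, num_classes, lam=1.0):
    """Cross-entropy objective plus the disentanglement regularizer."""
    return F.cross_entropy(logits, y) + lam * conditional_disentanglement_penalty(
        z, s, y, num_classes
    )
```

Unlike adversarial disentanglement, a penalty of this shape introduces no extra trainable parameters, which is one sense in which a parameter-free regularizer is easier and faster to train.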
The Glass Ceiling of Automatic Evaluation in Natural Language Generation
Automatic evaluation metrics capable of replacing human judgments are
critical to allowing fast development of new methods. Thus, numerous research
efforts have focused on crafting such metrics. In this work, we take a step
back and analyze recent progress by comparing the body of existing automatic
metrics and human metrics altogether. Since metrics are ultimately used to
rank systems, we compare them in the space of system rankings. Our extensive
statistical analysis reveals surprising findings: automatic metrics -- old and
new -- are much more similar to each other than to humans. Automatic metrics
are not complementary and rank systems similarly. Strikingly, human metrics
predict each other much better than the combination of all automatic metrics
used to predict a human metric. This is surprising because human metrics are
often designed to be independent and to capture different aspects of quality, e.g.,
content fidelity or readability. We provide a discussion of these findings and
recommendations for future work in the field of evaluation.
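Comparing metrics "in the space of system rankings" can be made concrete with a small sketch: treat each metric as the ranking it induces over systems and measure pairwise agreement with Kendall's tau. This only illustrates the setup, the paper's statistical analysis is far more extensive, and the scores below are made up.

```python
import numpy as np
from scipy.stats import kendalltau

def ranking_similarity(scores):
    """scores[m][s] = score that metric m assigns to system s.
    Returns the matrix of Kendall tau correlations between the system
    rankings induced by each pair of metrics."""
    m = len(scores)
    tau = np.eye(m)
    for i in range(m):
        for j in range(i + 1, m):
            tau[i, j] = tau[j, i] = kendalltau(scores[i], scores[j])[0]
    return tau

# made-up scores: 3 metrics evaluating 5 systems
scores = np.array([
    [0.10, 0.40, 0.20, 0.90, 0.50],   # automatic metric A
    [0.15, 0.45, 0.10, 0.80, 0.60],   # automatic metric B
    [0.90, 0.10, 0.80, 0.20, 0.30],   # human metric
])
print(ranking_similarity(scores).round(2))
```

In this toy example the two automatic metrics correlate strongly with each other and weakly with the human one, which is the shape of the finding reported above.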
Online Matching in Geometric Random Graphs
We investigate online maximum cardinality matching, a central problem in ad
allocation. In this problem, users are revealed sequentially, and each new user
can be paired with any previously unmatched campaign that it is compatible
with. Despite the limited theoretical guarantees, the greedy algorithm, which
matches incoming users with any available campaign, exhibits outstanding
performance in practice. Some theoretical support for this practical success
was established in specific classes of graphs, where the connections between
different vertices lack strong correlations - an assumption not always valid.
To bridge this gap, we focus on the following model: both users and campaigns
are represented as points uniformly distributed in the interval [0,1], and a
user is eligible to be paired with a campaign if they are similar enough, i.e.,
the distance between their respective points is less than c/N, with c a
model parameter and N the number of users. As a benchmark, we determine the
size of the optimal offline
matching in these bipartite random geometric graphs. In the online setting, we
investigate the number of matches made by the online algorithm closest, which
greedily pairs incoming points with their nearest available neighbors. We
demonstrate that the algorithm's performance can be compared to its fluid
limit, which is characterized as the solution to a specific partial
differential equation (PDE). From this PDE solution, we can compute the
competitive ratio of closest, and our computations reveal that it remains
significantly better than its worst-case guarantee. This model turns out to be
related to the online minimum cost matching problem, and we can extend the
results to refine certain findings in that area of research. Specifically, we
determine the exact asymptotic cost of closest in the excess regime,
providing a more accurate estimate than the previously known loose upper bound.
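A minimal simulation of the greedy algorithm closest in this model conveys the setting. The sketch below assumes the threshold form c/N reconstructed above, uses illustrative names, and is a toy experiment, not the paper's fluid-limit/PDE analysis.

```python
import bisect
import random

def simulate_closest(n, c, seed=None):
    """Toy simulation: n campaigns and n users are uniform on [0, 1]; users
    arrive one by one and each is matched by `closest` to the nearest still
    available campaign within distance c / n, if any. Returns matching size."""
    rng = random.Random(seed)
    campaigns = sorted(rng.random() for _ in range(n))   # available campaigns
    matched = 0
    for _ in range(n):                                   # online arrivals
        u = rng.random()
        i = bisect.bisect_left(campaigns, u)
        best = None                                      # nearest candidate
        for j in (i - 1, i):                             # left/right neighbors
            if 0 <= j < len(campaigns) and abs(campaigns[j] - u) <= c / n:
                if best is None or abs(campaigns[j] - u) < abs(campaigns[best] - u):
                    best = j
        if best is not None:
            campaigns.pop(best)                          # campaign is consumed
            matched += 1
    return matched

# fraction of users matched for a few connectivity parameters c
for c in (0.5, 1.0, 2.0):
    print(c, simulate_closest(10_000, c, seed=0) / 10_000)
```

Comparing this fraction with the offline optimum, which is computable in this one-dimensional model, gives an empirical handle on the competitive ratio discussed above.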
A Functional Data Perspective and Baseline On Multi-Layer Out-of-Distribution Detection
A key ingredient of out-of-distribution (OOD) detection is to exploit a trained
neural network by extracting statistical patterns and relationships across the
layers of the classifier in order to detect shifts in the expected input data
distribution. Despite achieving solid results, several state-of-the-art methods
rely on the penultimate or last layer outputs only, leaving behind valuable
information for OOD detection. Methods that do explore multiple layers require
either a special architecture or a supervised objective to do so. This work
adopts an original approach based on a functional view of the network that
exploits the sample's trajectories through the various layers and their
statistical dependencies. It goes beyond multivariate features aggregation and
introduces a baseline rooted in functional anomaly detection. In this new
framework, OOD detection translates into detecting samples whose trajectories
differ from the typical behavior characterized by the training set. We validate
our method and empirically demonstrate its effectiveness in OOD detection
compared to strong state-of-the-art baselines on computer vision benchmarks.
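To make the functional view concrete, here is a hedged sketch of the general idea: summarize each sample's pass through the network as a curve over depth, its "trajectory", and score a test sample by how far its curve deviates from typical training curves. The layer summary and the z-score band below are deliberately simple stand-ins, not the paper's functional anomaly detector, and all names are illustrative.

```python
import numpy as np

def layer_trajectory(activations):
    """Reduce a sample's per-layer activations to a 1-D curve over depth,
    here via the root-mean-square of each layer's activation values.
    `activations` is a list of per-layer numpy arrays for one sample."""
    return np.array([np.sqrt(np.mean(a ** 2)) for a in activations])

def fit_reference(train_trajectories):
    """Summarize training trajectories by a per-depth mean and std band."""
    T = np.stack(train_trajectories)        # (num_samples, num_layers)
    return T.mean(axis=0), T.std(axis=0) + 1e-8

def ood_score(trajectory, mean, std):
    """Deviation of a sample's trajectory from the typical training band,
    averaged over depth; larger values suggest out-of-distribution input."""
    return float(np.abs((trajectory - mean) / std).mean())
```

A threshold on this score, calibrated on held-out in-distribution data, then turns the trajectory deviation into an OOD decision.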